Data Stream Clustering Algorithms: A Review

Authors

  • Maryam Mousavi
  • Azuraliza Abu Bakar
  • Mohammadmahdi Vakilian

Abstract

Data stream mining has attracted growing research interest in recent years. Its key challenge is extracting valuable knowledge in real time from a massive, continuous, and dynamic data stream in only a single scan. Clustering is an effective tool for addressing this challenge. Data stream clustering can be applied in many fields, such as financial transactions, telephone records, sensor network monitoring, telecommunications, website analysis, weather monitoring, and e-business. It also poses its own challenges: it must be performed within a short time frame and with limited memory, using a single-scan process. Moreover, because outliers are hidden within the stream, clustering algorithms must be able to detect outliers and noise. In addition, the algorithms have to handle concept drift and detect clusters of arbitrary shape. Several algorithms have been proposed to overcome these challenges. This paper reviews five types of data stream clustering approaches: partitioning, hierarchical, density-based, grid-based, and model-based. The data stream clustering algorithms in the literature are discussed together with their respective advantages and disadvantages.
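The constraints listed above (one pass, bounded memory, noise handling, concept drift) can be made concrete with a small sketch. The code below is a hypothetical illustration, not a method from this review: it keeps a fixed budget of micro-clusters, each summarized by a weight, a linear sum, and a squared sum, loosely in the spirit of the micro-cluster summaries used by algorithms such as CluStream and DenStream. Old data fades with an exponential factor of the form 2^(-λΔt), similar to DenStream's fading function. The names `MicroCluster`, `StreamClusterer`, `radius`, `max_clusters`, and `lam` are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class MicroCluster:
    """Compact summary of many points: decayed weight, linear sum, squared sum."""
    weight: float
    linear_sum: List[float]
    squared_sum: List[float]
    last_update: int

    def decay(self, now: int, lam: float) -> None:
        # Fade statistics by 2^(-lam * elapsed) so older points count less (concept drift).
        factor = 2.0 ** (-lam * (now - self.last_update))
        self.weight *= factor
        self.linear_sum = [v * factor for v in self.linear_sum]
        self.squared_sum = [v * factor for v in self.squared_sum]
        self.last_update = now

    def center(self) -> List[float]:
        return [v / self.weight for v in self.linear_sum]

    def absorb(self, x: Sequence[float]) -> None:
        # Add one point to the summary; the raw point itself is never stored.
        self.weight += 1.0
        self.linear_sum = [s + v for s, v in zip(self.linear_sum, x)]
        self.squared_sum = [s + v * v for s, v in zip(self.squared_sum, x)]


class StreamClusterer:
    """Hypothetical single-pass clusterer with a fixed memory budget."""

    def __init__(self, radius: float, max_clusters: int, lam: float = 0.01):
        self.radius = radius              # assumed absorption threshold
        self.max_clusters = max_clusters  # bound on memory use
        self.lam = lam                    # assumed decay rate
        self.clusters: List[MicroCluster] = []
        self.t = 0

    def update(self, x: Sequence[float]) -> int:
        """Process one point exactly once; return the index of its micro-cluster."""
        self.t += 1
        for mc in self.clusters:
            mc.decay(self.t, self.lam)

        # Find the nearest existing micro-cluster center.
        best: Optional[int] = None
        best_dist = math.inf
        for i, mc in enumerate(self.clusters):
            d = math.dist(mc.center(), x)
            if d < best_dist:
                best, best_dist = i, d

        if best is not None and best_dist <= self.radius:
            self.clusters[best].absorb(x)
            return best

        # Too far from every summary: start a new micro-cluster
        # (it may grow into a real cluster or stay light and be treated as noise).
        if len(self.clusters) >= self.max_clusters:
            # Evict the lightest summary; indices after it shift down by one.
            weakest = min(range(len(self.clusters)),
                          key=lambda i: self.clusters[i].weight)
            self.clusters.pop(weakest)
        self.clusters.append(
            MicroCluster(1.0, list(x), [v * v for v in x], self.t))
        return len(self.clusters) - 1
```

For example, feeding the points `(0, 0), (0.2, 0.1), (5, 5), (0.1, -0.1), (5.2, 4.9)` to `StreamClusterer(radius=1.0, max_clusters=10)` assigns them to micro-clusters `0, 0, 1, 0, 1`. Evicting the lightest summary is only one simple way to respect the memory bound; the algorithms surveyed in the paper use a variety of merging, pruning, and offline reclustering strategies.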

Publication date: 2015